On Staggered Checkpointing

Author

  • Nitin H. Vaidya
Abstract

A consistent checkpointing algorithm saves a consistent view of a distributed application's state on stable storage. The traditional consistent checkpointing algorithms require different processes to save their state at about the same time. This causes contention for the stable storage, potentially resulting in large overheads. Staggering the checkpoints taken by various processes can reduce checkpoint overhead [10]. This paper presents a simple approach to arbitrarily stagger the checkpoints. Our approach requires that the processes take consistent logical checkpoints, as compared to consistent physical checkpoints enforced by existing algorithms. Experimental results on nCube-2 are presented.

Similar resources

A Scalable Algorithm for Compiler-placed Staggered Checkpointing

To make progress in the face of possible system failures, long-running parallel applications often checkpoint, or save their state, so they can resume execution. Many current checkpointing techniques require user input, impose run-time performance penalties, or result in all processes checkpointing synchronously which leads to network and file system contention, again resulting in significant p...

Staggered Consistent Checkpointing

A consistent checkpointing algorithm saves a consistent view of a distributed application's state on stable storage. The traditional consistent checkpointing algorithms require different processes to save their state at about the same time. This causes contention for the stable storage, potentially resulting in large overheads. Staggering the checkpoints taken by various processes can reduce c...

Some Thoughts on Distributed Recovery (preliminary version)

This report deals with some aspects of distributed recovery. The report is divided into multiple parts, each part introducing a problem and a solution. The intent of this report is to present a medley of preliminary ideas; more detailed treatment may be presented elsewhere. The report deals with the following problems: A single processor failure tolerance scheme based on the distributed recover...

Consistent Logical Checkpointing

A "consistent checkpointing" algorithm saves a consistent view of the distributed system state on stable storage. The loss of computation upon a failure can be bounded by taking consistent checkpoints with adequate frequency. The traditional consistent checkpointing algorithms require the different processes to save their state at about the same time. This causes contention for the stable storag...

Publication date: 1996